| variable | mean birdsong | sd birdsong | mean finance | sd finance |
|---|---|---|---|---|
| linearity | -0.047 | 0.785 | 15.385 | 24.845 |
| entropy | 0.838 | 0.226 | 0.526 | 0.408 |
| x_acf1 | 0.204 | 0.598 | 0.492 | 0.686 |
| covariate1 | 3.002 | 0.510 | 3.002 | 0.490 |
| covariate2 | 1.021 | 0.989 | 2.469 | 1.007 |
Finance vs Bird Time Series Classification Models
How well can I build a simple classifier?
a)
The dataset used in this analysis contains records for 974 time series (financial time series and audio tracks of birds), each described by five predictor variables - linearity, entropy, x_acf1, covariate1 and covariate2 - which are meant to be used to distinguish between the two. The true classification is stored in the type variable, which labels each series as “birdsongs” or “finance”.
A summary of the mean and standard deviation of the two types of time series for the different predictor variables can be seen in Table 2.
Table 2: Summary Statistics for Birdsong and Finance Time Series
As can be seen in Table 2:
- linearity: finance and birdsongs have extremely different means, with financial time series having a much higher mean and variance, making linearity a very strong feature for distinguishing between the types.
- entropy: birdsongs have a higher mean and lower variance than financial time series, resulting in more consistent entropy. Overall, there is a clear difference between the two types’ means and standard deviations, indicating that entropy may be useful.
- x_acf1: finance and birdsongs have a very strong difference in mean but similar variance, making x_acf1 a useful feature. (Note that financial time series have a higher autocorrelation on average, which makes sense intuitively.)
- covariate1: almost identical means and standard deviations between finance and birdsongs, making covariate1 not a useful feature for distinguishing between the types.
- covariate2: finance and birdsongs have a strong difference in mean and similar variance, meaning covariate2 may be quite informative.
Overall, from this quick numerical analysis, it is looking like: linearity, entropy, x_acf1 and covariate2 may be key features in distinguishing between the difference in financial time series and audio tracks of birds while covariate1 does not seem to be useful in helping distinguish the difference.
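One way to make this comparison more concrete is a rough standardised mean difference per feature. The sketch below (plain Python, with the pooled sd crudely approximated as the simple average of the two group sds, rather than a proper pooled estimate) reproduces the ranking suggested above:

```python
# Standardised mean difference for each feature, using the Table 2 values.
stats = {  # feature: (mean_bird, sd_bird, mean_fin, sd_fin)
    "linearity":  (-0.047, 0.785, 15.385, 24.845),
    "entropy":    (0.838, 0.226, 0.526, 0.408),
    "x_acf1":     (0.204, 0.598, 0.492, 0.686),
    "covariate1": (3.002, 0.510, 3.002, 0.490),
    "covariate2": (1.021, 0.989, 2.469, 1.007),
}

# |difference in means| divided by an (approximate) pooled sd.
smd = {
    f: abs(mb - mf) / ((sb + sf) / 2)
    for f, (mb, sb, mf, sf) in stats.items()
}
for f, d in sorted(smd.items(), key=lambda kv: -kv[1]):
    print(f"{f}: {d:.2f}")
```

On this crude scale, covariate2 and linearity show the largest standardised gaps, entropy is close behind, x_acf1 is moderate, and covariate1 is exactly zero - consistent with the ordering above.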
Figure 1’s density plots help visualise the insights that can be seen in Table 2.
Figure 1: Density Plots by Feature and Type
Figure 1 illustrates:
- covariate1: birdsongs and finance have a normal distribution with similar mean and variance.
- covariate2: birdsongs and finance have a normal distribution with similar variance but different means.
- entropy: finance exhibits a bimodal distribution while birdsongs exhibits a normal distribution, with different variances.
- linearity: birdsongs and finance both seem to be normally distributed, however, the means and variances of the two types are extremely different.
- x_acf1: birdsongs and finance have a bimodal distribution with slightly different variances and means.
Overall, the two types of time series are mainly distinguishable through linearity, entropy, x_acf1 and covariate2, while being essentially indistinguishable by covariate1. This visual conclusion aligns with the numerical conclusion.
The assumptions for linear discriminant analysis (LDA) include:
- the distribution of the predictors is a multivariate normal
- the same variance-covariance matrix
It is clear from Figure 1 that neither of these conditions is met:
- the distributions of some variables (such as x_acf1 and entropy) are bimodal rather than normally distributed
- there are large variance differences between the two groups, especially in linearity.
Consequently, the LDA assumptions do not hold.
b)
The data was broken into a training set and a testing set, as seen in the code below. The split was 70/30, meaning 70% of the data went to training and 30% to testing, and stratification ensured that the proportions of birdsongs and finance types are near-equal in both splits.
# Break the data into training and test samples, appropriately (70/30 split).
financebirds_strata <- financebirds |> initial_split(prop = 0.7, strata = type)
financebirds_train <- training(financebirds_strata)
financebirds_test <- testing(financebirds_strata)
c)
Even though the assumptions do not hold, an LDA model and a logistic regression model were fitted to the training data. Please see the .qmd file if you would like to see the code.
d)
Variable Importance
Regarding the LDA model fit, a summary of the linear discriminant of the different features can be seen in Table 3.
Table 3: LDA Fit Linear Discriminants
| Feature | LD1 |
|---|---|
| linearity | 0.020 |
| entropy | -1.079 |
| x_acf1 | -0.136 |
| covariate1 | 0.041 |
| covariate2 | 0.810 |
According to Table 3, for the LDA model, entropy is the most influential feature, followed closely by covariate2 as the second most influential feature. Meanwhile the other features contribute much less and may not improve classification much - especially x_acf1 and linearity. (Note, though, that these discriminant coefficients are on the raw variable scale, so a feature with a very large spread such as linearity can have a small coefficient while still carrying discriminating information.)
Regarding the logistic regression model fit, a summary of the logistic regression coefficients can be seen in Table 4.
Table 4: Logistic Regression Fit Coefficients
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1.513 | 0.767 | -1.973 | 0.048 |
| linearity | 0.048 | 0.009 | 5.277 | 0.000 |
| entropy | -2.047 | 0.433 | -4.732 | 0.000 |
| x_acf1 | -0.277 | 0.207 | -1.341 | 0.180 |
| covariate1 | 0.142 | 0.216 | 0.661 | 0.509 |
| covariate2 | 1.416 | 0.124 | 11.408 | 0.000 |
We can see from Table 4 that, for the Logistic Regression model, x_acf1 and covariate1 are not statistically significant given their high p-values (p > 0.05), meaning there is insufficient evidence that they are useful predictors (non-significance is a lack of evidence of usefulness, not strong evidence of uselessness). Meanwhile, linearity, entropy and covariate2 all have low p-values and are statistically significant - meaning there is strong evidence that they are useful predictors. Looking at the coefficient estimates, entropy is strongly negative, covariate2 is strongly positive, and linearity is positive but small per unit (though it acts on a variable with a very large range).
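Since logistic regression coefficients are on the log-odds scale, exponentiating them gives more interpretable odds ratios. A quick sketch using the Table 4 estimates (assuming “finance” is coded as the positive class, which the coefficient signs are consistent with):

```python
import math

# Table 4 coefficient estimates (log-odds scale).
coefs = {
    "linearity": 0.048,
    "entropy": -2.047,
    "x_acf1": -0.277,
    "covariate1": 0.142,
    "covariate2": 1.416,
}

# Odds ratio per one-unit increase in each predictor,
# holding the others fixed.
odds_ratios = {k: math.exp(v) for k, v in coefs.items()}
for k, v in odds_ratios.items():
    print(f"{k}: {v:.3f}")
```

So a one-unit increase in covariate2 multiplies the odds of the positive class by roughly 4, while a one-unit increase in entropy divides them by roughly 8 (1/0.129). For linearity the per-unit effect looks tiny (about ×1.05), but given its huge range in Table 2, its cumulative effect is large.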
Confusion Matrices and Accuracy
The confusion matrix for the LDA model and the Logistic Regression model on the test set can be seen in Table 5 and Table 6, respectively.
Table 5: LDA Model Confusion Matrix
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 101 | 42 | 0.706 |
| finance | 21 | 129 | 0.860 |
Table 6: Logistic Regression Model Confusion Matrix
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 124 | 19 | 0.867 |
| finance | 32 | 118 | 0.787 |
As can be seen from the confusion matrices in Tables 5 and 6, for the test set, the LDA model is better at predicting financial time series than the Logistic Regression model - correctly predicting financial time series 86.0% of the time compared to the Logistic Regression model’s 78.7%. Meanwhile the Logistic Regression model is better at predicting audio tracks of birds than the LDA model. In fact, the Logistic Regression model correctly predicts birdsongs 86.7% of the time while the LDA model correctly predicts birdsongs only 70.6% of the time in the test set.
From the confusion matrices, the accuracy and balanced accuracy for the two different models can be determined.
For the LDA model,
\[accuracy = 78.5\%\] \[balanced \ accuracy = 78.31\%\]
For the Logistic Regression,
\[accuracy = 82.59\%\] \[balanced \ accuracy = 82.69\%\]
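These figures can be reproduced directly from the confusion matrix counts in Tables 5 and 6. A small sketch of the arithmetic (balanced accuracy is the average of the two class-wise recalls):

```python
def accuracy_metrics(correct_a, wrong_a, wrong_b, correct_b):
    """Rows are true classes: (class A correct, class A wrong,
    class B wrong, class B correct)."""
    total = correct_a + wrong_a + wrong_b + correct_b
    acc = (correct_a + correct_b) / total
    balanced = (correct_a / (correct_a + wrong_a)
                + correct_b / (correct_b + wrong_b)) / 2
    return acc, balanced

# Table 5 (LDA): birdsongs row 101/42, finance row 21/129.
lda_acc, lda_bal = accuracy_metrics(101, 42, 21, 129)
# Table 6 (logistic): birdsongs row 124/19, finance row 32/118.
log_acc, log_bal = accuracy_metrics(124, 19, 32, 118)

print(f"LDA: acc={lda_acc:.2%}, balanced={lda_bal:.2%}")
print(f"Logistic: acc={log_acc:.2%}, balanced={log_bal:.2%}")
```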
As can be seen, the Logistic Regression model has a higher accuracy and balanced accuracy, while the LDA model has a slightly lower accuracy and balanced accuracy. If we were to use only this metric, it would indicate that the Logistic Regression model is the more appropriate of the two. However, this is not the best way to choose between the models, as these measures are computed at a single cut-off point (by default 0.5). The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve evaluates performance at every cut-off point, so conclusions drawn from it favour the model with the best-ranked probabilities overall. More information on the ROC curves for these two models is given in Section 2e).
Figure 2 and Figure 3 illustrate the different models confidence in predicting the correct class.
Figure 2: LDA Model Confidence in Correct Class
Figure 3: Logistic Regression Model Confidence in Correct Class
As can be seen in Figure 2, the LDA model is more confident at classifying financial time series than audio tracks of birds, with a higher median confidence; meanwhile, the Logistic Regression model is highly confident at classifying both, apart from a few low-confidence values even when the prediction is correct (as seen in Figure 3).
Mistakes
In total there are 74 unique mistakes made between the LDA model and the Logistic Regression model. A glimpse at the first six mistakes can be seen in Table 7.
Table 7: Glimpse at the Mistakes made by the LDA and Logistic model
| lda_pred | logistic_pred | type | same_mistake |
|---|---|---|---|
| finance | birdsongs | birdsongs | FALSE |
| finance | birdsongs | birdsongs | FALSE |
| finance | finance | birdsongs | TRUE |
| finance | birdsongs | birdsongs | FALSE |
| finance | birdsongs | birdsongs | FALSE |
| finance | finance | birdsongs | TRUE |
Breaking this down further, Table 8 outlines a summary of the mistakes made by the two linear models.
Table 8: Summary of the Mistakes Made by LDA and Logistic Regression Model
| lda_total_mistakes | logistic_total_mistakes | same_mistake_count |
|---|---|---|
| 63 | 51 | 40 |
As can be seen in Table 8, there are 40 mistakes made by both the LDA and the logistic regression model, meaning that 63.5% of the LDA model’s mistakes and 78.4% of the Logistic Regression model’s mistakes are on the same observations. That is a very large chunk.
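Table 8’s counts imply both the overlap percentages and the total number of distinct mistakes; a quick check of the arithmetic (which also confirms the 74 unique mistakes quoted earlier):

```python
lda_mistakes = 63
logistic_mistakes = 51
shared = 40  # mistakes on the same observations (Table 8)

# Inclusion-exclusion: distinct misclassified observations.
unique_mistakes = lda_mistakes + logistic_mistakes - shared
share_of_lda = shared / lda_mistakes
share_of_logistic = shared / logistic_mistakes

print(unique_mistakes)
print(f"{share_of_lda:.1%}")       # fraction of LDA mistakes that are shared
print(f"{share_of_logistic:.1%}")  # fraction of logistic mistakes that are shared
```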
To better understand why this is the case, Figure 4 and Figure 5 help us investigate.
Figure 4: LDA Model: Grand Tour and Confusion Scatterplot
Figure 5: Logistic Regression Model: Grand Tour and Confusion Scatterplot
As can be seen in Figure 4 and Figure 5, the grand tours illustrate that classifying the correct type can be difficult at times due to the overlap between the two time series types. Consequently, as can be seen in the confusion scatterplots for both LDA and Logistic Regression (Figures 4 and 5), there are misclassifications for both birdsongs and finance. Interestingly, the number of misclassifications looks roughly similar for both types and both models.
e)
The ROC curves for the LDA model and the Logistic Regression model can be seen in Figure 6.
Figure 6: LDA vs Logistic Regression Model ROC Curves
As can be seen in Figure 6, the ROC curves for the two models are very similar, with the LDA ROC curve looking marginally better. To confirm this, we can check the area under the ROC curve (AUC), which is summarised in Table 9.
Table 9: Area Under the ROC Curve for the LDA and Logistic Regression Model
| Model | AUC |
|---|---|
| LDA | 0.898 |
| Logistic Regression | 0.894 |
As can be seen in Table 9, the LDA model has a marginally better AUC value, which supports our visual conclusion about the ROC curves. The ROC curve tells us how well a model separates the two classes regardless of the decision threshold. Consequently, of the two, the LDA model is the better model. However, as the AUCs of the two models are so similar, the difference between them is minimal, meaning that if one wanted a simpler model with similar accuracy, the Logistic Regression model may be suitable.
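AUC also has a useful probabilistic reading: it is the probability that a randomly chosen positive-class observation receives a higher score than a randomly chosen negative-class one. A minimal stdlib sketch of that equivalence (the scores here are toy values, not the model outputs used in this analysis):

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the Mann-Whitney rank statistic: the fraction of
    (positive, negative) pairs the scores order correctly,
    counting ties as half-correct."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy example: scores for a positive and a negative class.
pos = [0.9, 0.8, 0.6]
neg = [0.7, 0.3, 0.2]
print(auc_from_scores(pos, neg))  # 8 of 9 pairs ordered correctly
```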
f)
From the LDA and Logistic Regression models, we can conclude how the time series for financial data and birdsongs typically differ. From both fitted models, we saw in Table 3 and Table 4 that the variables entropy and covariate2 are very significant in helping to distinguish between birdsongs and financial data. Additionally, linearity can also be critical in distinguishing between them, with such contrasting means and variances between the two types as seen in Table 2. However, there are around 40 observations that both models found difficult to predict correctly (as outlined in Table 8), which is the major drawback in distinguishing between the different time series. When looking at the tours and confusion scatterplots in Figures 4 and 5, we saw that in general the two types are easily distinguishable, except for a few (~40) points that overlap in some frames of the tour.
Overall, we can see from the fitted model and summary statistics that the time series for financial data and birdsongs typically differ due to entropy, covariate2 and linearity.
Tuning a non-linear classifier
a)
Using the tidymodels style of coding, a “bad” decision tree was fitted to the training data, with min_n = 1 and cost_complexity = 0. This means we are training a tree where the minimum number of samples required to split a node is 1, and there is no penalty for adding extra branches. Consequently, this “bad” decision tree is going to be extremely overfitted. A plot of this extremely overfitted, “bad” decision tree can be seen in Figure 7.
Figure 7: “Bad” Decision Tree Fit
This “bad” decision tree is so complex and overfitted that the plot is extremely hard to read. In fact, there are 73 terminal nodes and 145 branches in this “bad” decision tree. The deep splits and many branches indicate that it has done a very good job of ‘memorising’ patterns in the training data.
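For context on how such a tree grows: with cost_complexity = 0, rpart essentially keeps splitting as long as any split reduces node impurity at all (by default, Gini impurity for classification). A minimal sketch of the impurity calculation, using the training class counts (331 birdsongs, 350 finance) as the root node:

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Root node: the full training set (331 birdsongs, 350 finance).
root = gini([331, 350])
print(f"{root:.4f}")  # close to the two-class maximum of 0.5

# A pure leaf has zero impurity - the tree stops splitting there.
print(gini([10, 0]))
```

With no complexity penalty and min_n = 1, the tree keeps splitting until every leaf is pure (impurity 0), which is exactly the memorisation seen in Figure 7.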
In investigating the importance of the variables for this “bad” decision tree, a summary table can be seen in Table 10.
Table 10: Variable Importance for the “Bad” Decision Tree
| Feature | Importance |
|---|---|
| linearity | 164.4 |
| covariate2 | 130.1 |
| entropy | 107.2 |
| x_acf1 | 80.9 |
| covariate1 | 21.0 |
As can be seen in Table 10, linearity and covariate2 are extremely important in helping distinguish the type, entropy and x_acf1 are also important, and covariate1 is the least important.
Table 11 and Table 12 both give us insights into how well the “bad” decision tree performed on the training data set and the testing data set, respectively.
Table 11: Confusion Matrix on Training Data for “Bad” Decision Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 331 | 0 | 1 |
| finance | 0 | 350 | 1 |
Table 12: Confusion Matrix on Testing Data for “Bad” Decision Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 127 | 16 | 0.888 |
| finance | 16 | 134 | 0.893 |
As can be seen in Table 11, the “bad” decision tree ended up perfectly fitting every single observation in the training set - a clear sign of overfitting. Meanwhile, when looking at the test set confusion matrix in Table 12, the “bad” decision tree didn’t end up performing as “bad” as one might expect, correctly predicting birdsongs 88.8% of the time and finance 89.3% of the time. Although the model is overfitted, it actually performed okay.
On the training set:
\[accuracy = 100\%\] \[balanced \ accuracy = 100\%\]
On the test set:
\[accuracy = 89.08\%\] \[balanced \ accuracy = 89.07\%\]
As can be seen, the model achieved a perfect 100% accuracy and balanced accuracy on the training set, as expected given the overfitting. Surprisingly, on the test set it still performed reasonably well, with an accuracy of 89.08% and a balanced accuracy of 89.07%. This overfitted, “bad” decision tree actually has a higher accuracy and balanced accuracy on the test set than the LDA or Logistic Regression model. (Again note that for a more complete comparison, the ROC curves must be compared, as done in Section 3c).)
b)
Using the capabilities in tidymodels, the optimal tree parameters - tree_depth, min_n, cost_complexity - were determined. The code used to determine the optimal tree parameters can be seen below.
# Define the tunable decision tree spec
tune_spec <-
decision_tree(
cost_complexity = tune(),
tree_depth = tune(),
min_n = tune()
) |>
set_engine("rpart") |>
set_mode("classification")
# Create a grid of parameters to tune
tree_grid <- grid_regular(
cost_complexity(),
tree_depth(),
min_n(),
levels = 5
)
# Set up cross-validation folds
set.seed(234)
financebirds_folds <- vfold_cv(financebirds_train, v = 5, strata = type)
# Create a workflow
tree_wf <- workflow() |>
add_model(tune_spec) |>
add_formula(type ~ linearity + entropy + x_acf1 + covariate1 + covariate2)
# Perform the tuning
set.seed(345)
tree_res <- tree_wf |>
tune_grid(
resamples = financebirds_folds,
grid = tree_grid,
metrics = NULL # Computes a standard set of metrics
)
# View and summarise results
tree_res |> collect_metrics() |> slice_head(n = 6)
# Find the best combination (based on AUC)
tree_top_5 <- tree_res |> show_best(metric = "roc_auc")
tree_best <- tree_res |> select_best(metric = "roc_auc")
# Finalize workflow and fit on training data
tuned_tree_wf <- tree_wf |> finalize_workflow(tree_best)
tuned_tree_fit <- tuned_tree_wf |> fit(data = financebirds_train)
To determine the optimal tree parameters, I defined a tunable decision tree model specification with cost complexity, tree depth, and minimum node size as hyperparameters. I then created a regular grid of parameter combinations (with five levels for each hyperparameter) and implemented five-fold cross-validation. Each model in the grid was trained and evaluated using default classification metrics. After tuning, I collected and analysed the performance metrics and selected the best hyperparameter combination based on ROC AUC.
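The regular grid itself is just the Cartesian product of the chosen levels, which is what drives the tuning cost. A small stdlib-Python sketch of the counting (the level values below are illustrative - grid_regular picks its own defaults - but the logic matches):

```python
from itertools import product

# Five illustrative levels per hyperparameter (grid_regular uses
# its own default ranges; these values are only for the sketch).
cost_complexity = [1e-10, 1e-8, 1e-6, 1e-4, 1e-2]
tree_depth = [1, 4, 8, 11, 15]
min_n = [2, 11, 21, 30, 40]

grid = list(product(cost_complexity, tree_depth, min_n))
print(len(grid))      # 5 * 5 * 5 = 125 candidate models

# With 5-fold cross-validation, each candidate is fitted 5 times:
print(len(grid) * 5)  # 625 model fits in total
```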
The top 5 best performing tuned decision tree models, based on ROC AUC, can be seen in Table 13.
Table 13: Top 5 Tuned Decision Tree Models Ranked by ROC AUC Performance
| cost_complexity | tree_depth | min_n | .metric | .estimator | mean | n | std_err | .config |
|---|---|---|---|---|---|---|---|---|
| 1.00e-10 | 11 | 21 | roc_auc | binary | 0.946 | 5 | 0.004 | Preprocessor1_Model066 |
| 1.78e-08 | 11 | 21 | roc_auc | binary | 0.946 | 5 | 0.004 | Preprocessor1_Model067 |
| 3.16e-06 | 11 | 21 | roc_auc | binary | 0.946 | 5 | 0.004 | Preprocessor1_Model068 |
| 5.62e-04 | 11 | 21 | roc_auc | binary | 0.946 | 5 | 0.004 | Preprocessor1_Model069 |
| 1.00e-10 | 15 | 21 | roc_auc | binary | 0.946 | 5 | 0.004 | Preprocessor1_Model071 |
As can be seen in Table 13, the ROC AUC mean value is similar for the top 5, however, what makes the first model (seen in the first row) superior is the combination of a low cost_complexity and lower tree_depth.
Consequently, the hyperparameters that will lead to the best model can be seen in Table 14.
Table 14: Best Tuned Decision Tree Hyperparameters Model Based on ROC AUC
| cost_complexity | tree_depth | min_n |
|---|---|---|
| 1e-10 | 11 | 21 |
c)
Using the optimal hyperparameters - min_n = 21, tree_depth = 11 and cost_complexity = 1e-10 - the tuned decision tree was fitted to the training data. This means we are training a tree where the minimum number of samples required to split a node is 21, the maximum depth from the root is 11, and there is a very small penalty for adding extra branches. Consequently, this tuned decision tree should not overfit the data like the “bad” decision tree did. A plot of this tuned decision tree can be seen in Figure 8.
Figure 8: Tuned Decision Tree Fit
As can be seen in Figure 8, the tuned decision tree is somewhat complex, but nowhere near as complex as the “bad” decision tree seen in Figure 7. In fact, the tuned decision tree has 15 terminal nodes and 14 branches.
In investigating the importance of the variables for this tuned decision tree, a summary table can be seen in Table 15.
Table 15: Variable Importance for the Tuned Decision Tree
| Feature | Importance |
|---|---|
| linearity | 132.69 |
| covariate2 | 120.70 |
| entropy | 81.67 |
| x_acf1 | 70.17 |
| covariate1 | 2.77 |
Exactly as for the ‘bad’ decision tree, we can see from Table 15 that linearity and covariate2 are extremely important in helping distinguish the type for the tuned decision tree, entropy and x_acf1 are also important, and covariate1 is the least important.
Table 16 and Table 17 both give us insights into how well the tuned decision tree performed on the training data set and the testing data set, respectively.
Table 16: Confusion Matrix on Training Data for Tuned Decision Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 306 | 25 | 0.924 |
| finance | 26 | 324 | 0.926 |
Table 17: Confusion Matrix on Testing Data for Tuned Decision Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 126 | 17 | 0.881 |
| finance | 17 | 133 | 0.887 |
As can be seen in Table 16, the tuned decision tree ended up correctly predicting birdsongs 92.4% of the time and finance 92.6% of the time in the training set - unlike the overfitted model, which achieved 100% on the training set. When looking at the test set confusion matrix in Table 17, the tuned decision tree correctly predicted birdsongs 88.1% of the time and finance 88.7% of the time. Overall, this is quite a good result.
On the training set:
\[accuracy = 92.51\%\] \[balanced \ accuracy = 92.51\%\]
On the test set:
\[accuracy = 88.4\%\] \[balanced \ accuracy = 88.39\%\]
As can be seen, the accuracy and balanced accuracy are very high on the training set, and on the test set the model continued to perform quite well, with an accuracy of 88.4% and a balanced accuracy of 88.39%. This tuned decision tree has a higher accuracy and balanced accuracy on the test set than the LDA and Logistic Regression models, though marginally lower than the “bad” decision tree’s - a reminder that accuracy at a single cut-off is a limited basis for comparison.
Now in terms of which model is the most accurate and the best choice, Figure 9 illustrates the ROC Curve for the “bad” decision tree and the tuned decision tree.
Figure 9: ROC Curves for Bad vs Tuned Decision Trees
As can be seen in Figure 9, the tuned decision tree has a much better ROC curve than the “bad” decision tree. Additionally, the “bad” decision tree’s ROC curve appears to follow a piecewise-linear shape with two segments. Now checking the area under the ROC curve, a summary can be seen in Table 18.
Table 18: Area Under Curve (AUC) for the different decision tree models.
| Model | AUC |
|---|---|
| Bad Decision Tree | 0.891 |
| Tuned Decision Tree | 0.934 |
Regarding the ROC curve and AUC, the “bad” decision tree performed worse than the LDA or Logistic Regression model - having a lower AUC. However, the tuned decision tree had a much better ROC curve and AUC value than the “bad” decision tree, LDA and Logistic Regression model. Therefore, in practice I would use the tuned decision tree over any of the other three models, as it is noticeably more accurate. This decision makes sense because, given the data set, a tuned decision tree is likely to outperform the linear models, as no single clear linear separation is apparent in the data (see the grand tours in Figures 4 and 5 to visualise it).
Which is the better classifier?
a)
A Random Forest model was developed to distinguish between financial time series and audio tracks of birds. Using the randomForest engine with 1000 trees, I tuned mtry and min_n through a 5-fold stratified cross-validation. I created a regular grid over the ranges mtry = 1 to 5 and min_n = 2 to 20, using 5 levels for each parameter. I selected the best hyperparameters for the model based on ROC AUC and finalised the workflow before fitting it to the full training set.
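Part of the intuition for using 1000 trees: if individual trees are better than chance and not perfectly correlated, a majority vote is far more accurate than any single tree. A toy calculation under the (unrealistic) assumption of fully independent trees, each 55% accurate:

```python
from math import comb

def majority_vote_accuracy(n_voters, p):
    """Probability that a strict majority of n independent voters,
    each correct with probability p, gets the right answer."""
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(n_voters // 2 + 1, n_voters + 1)
    )

# A single weak tree vs. an ensemble of 1001 of them.
print(majority_vote_accuracy(1, 0.55))     # 0.55
print(majority_vote_accuracy(1001, 0.55))  # very close to 1
```

In practice, trees grown on bootstrap samples of the same data are correlated, which is exactly why random forests decorrelate them by sampling only mtry candidate variables at each split.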
Note: I also played around with using the Ranger engine, however, randomForest seemed to perform better.
The top 5 best performing tuned random forest models, based on ROC AUC, can be seen in Table 19.
Table 19: Top 5 Tuned Random Forest Models Ranked by ROC AUC Performance
| mtry | min_n | .metric | .estimator | mean | n | std_err | .config |
|---|---|---|---|---|---|---|---|
| 5 | 15 | roc_auc | binary | 0.947 | 5 | 0.005 | Preprocessor1_Model20 |
| 3 | 15 | roc_auc | binary | 0.947 | 5 | 0.006 | Preprocessor1_Model18 |
| 5 | 11 | roc_auc | binary | 0.947 | 5 | 0.005 | Preprocessor1_Model15 |
| 4 | 11 | roc_auc | binary | 0.947 | 5 | 0.005 | Preprocessor1_Model14 |
| 4 | 15 | roc_auc | binary | 0.947 | 5 | 0.006 | Preprocessor1_Model19 |
As can be seen in Table 19, the ROC AUC mean value is essentially identical across the top 5. Interestingly, even the smaller mtry values in the table (e.g. mtry = 3 with min_n = 15 in row 2) achieve the same ROC AUC, which indicates that only a couple of variables do most of the distinguishing (not a surprise, given our previous analysis). Table 20 illustrates the variable importance for the random forest.
Table 20: Variable Importance for the Random Forest
| Feature | MeanDecreaseGini |
|---|---|
| linearity | 121.43 |
| entropy | 18.04 |
| x_acf1 | 32.69 |
| covariate1 | 6.46 |
| covariate2 | 113.47 |
As can be seen in Table 20, and as with the other models in this analysis, for the random forest model linearity and covariate2 are extremely important in helping distinguish the type, while x_acf1 and entropy are much less important, and covariate1 is nearly negligible. Interestingly, entropy ranks lower here than in the tuned and “bad” decision trees, while linearity remains the most important variable in distinguishing the time series type.
Overall, the hyperparameters that will lead to the best random forest model can be seen in Table 21.
Table 21: Best Tuned Random Forest Parameters Model Based on ROC AUC
| mtry | min_n |
|---|---|
| 5 | 15 |
Table 22 and Table 23 both give us insights into how well the random forest model performed on the training data set and the testing data set, respectively.
Table 22: Confusion Matrix on Training Data for Random Forest Model
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 314 | 17 | 0.949 |
| finance | 18 | 332 | 0.949 |
Table 23: Confusion Matrix on Testing Data for Random Forest Model
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 134 | 9 | 0.937 |
| finance | 19 | 131 | 0.873 |
As can be seen in Table 22, the random forest model ended up correctly predicting birdsongs 94.9% of the time and finance 94.9% of the time in the training set. When looking at the test set confusion matrix in Table 23, the random forest model correctly predicted birdsongs 93.7% of the time and finance 87.3% of the time. Overall, this is quite a good result, and the test set confusion matrix looks better for the random forest model than for the other models so far. Note that throughout, including for the random forest model, the models incorrectly predict birdsongs for time series that are finance more often than vice versa.
On the training set:
\[accuracy = 94.86\%\] \[balanced \ accuracy = 94.86\%\]
On the test set:
\[accuracy = 90.44\%\] \[balanced \ accuracy = 90.52\%\]
As can be seen, regarding the training set, the random forest model’s accuracy and balanced accuracy are reasonably high (higher than the tuned decision tree model on the training set). For the test set, the random forest model also achieves a high accuracy and balanced accuracy. This random forest model has a higher accuracy and balanced accuracy on the test set than the tuned decision tree, “bad” decision tree, LDA or Logistic Regression model.
Although the random forest is doing very well, there still seem to be errors. Even when playing around with the hyperparameters and training a more complex random forest, the results don’t improve much. To better understand why, Table 24 shows the vote share the random forest gave each type and the resulting vote difference, arranged from smallest vote difference to largest.
Table 24: Random Forest Predictions Sorted by Vote Difference
| .pred_birdsongs | .pred_finance | vote_diff | predicted_class | actual |
|---|---|---|---|---|
| 0.480 | 0.520 | 0.040 | finance | birdsongs |
| 0.521 | 0.479 | 0.042 | birdsongs | birdsongs |
| 0.523 | 0.477 | 0.046 | birdsongs | birdsongs |
| 0.525 | 0.475 | 0.050 | birdsongs | birdsongs |
| 0.475 | 0.525 | 0.050 | finance | finance |
| 0.470 | 0.530 | 0.060 | finance | birdsongs |
| 0.531 | 0.469 | 0.062 | birdsongs | birdsongs |
| 0.546 | 0.454 | 0.092 | birdsongs | birdsongs |
| 0.570 | 0.430 | 0.140 | birdsongs | finance |
| 0.572 | 0.428 | 0.144 | birdsongs | birdsongs |
| 0.578 | 0.422 | 0.156 | birdsongs | birdsongs |
| 0.418 | 0.582 | 0.164 | finance | birdsongs |
| 0.582 | 0.418 | 0.164 | birdsongs | birdsongs |
| 0.582 | 0.418 | 0.164 | birdsongs | birdsongs |
| 0.585 | 0.415 | 0.170 | birdsongs | birdsongs |
| 0.593 | 0.407 | 0.186 | birdsongs | finance |
| 0.594 | 0.406 | 0.188 | birdsongs | finance |
| 0.397 | 0.603 | 0.206 | finance | finance |
| 0.389 | 0.611 | 0.222 | finance | finance |
| 0.615 | 0.385 | 0.230 | birdsongs | birdsongs |
As can be seen in Table 24, there are 8 observations that the random forest model struggles to clearly distinguish (vote_diff < 10%). Additionally, within the 20 observations with the smallest vote differences shown, there are 6 incorrect predictions. Given that in total there are 33 incorrect predictions, and only 6 of them appear in this low-confidence set, it indicates that the variables may not be clearly distinguishable enough to achieve much higher accuracy than what is achieved, which explains why some errors remain.
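The vote_diff column is simply the absolute gap between the two class vote shares; a minimal sketch of how one row of such a table could be derived (using the illustrative values from the first row of Table 24):

```python
def vote_summary(p_birdsongs, p_finance):
    """Return the predicted class and the vote difference
    for one observation's vote shares."""
    predicted = "birdsongs" if p_birdsongs > p_finance else "finance"
    vote_diff = abs(p_birdsongs - p_finance)
    return predicted, vote_diff

# First row of Table 24: an observation the forest nearly ties on.
pred, diff = vote_summary(0.480, 0.520)
print(pred, round(diff, 3))  # the model only just prefers finance
```

A small vote_diff means the trees nearly split evenly, so these are the observations where the ensemble is effectively guessing.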
b)
A boosted tree model was developed to distinguish between financial time series and audio tracks of birds. Using the xgboost engine with 1000 trees, I tuned mtry, min_n and tree_depth through a 5-fold stratified cross-validation. I created a regular grid over the ranges mtry = 1 to 5, min_n = 2 to 20 and tree_depth = 1 to 10, using 5 levels for each parameter. I selected the best hyperparameters for the model based on ROC AUC and finalised the workflow before fitting it to the full training set.
Note: I also played around with tuning other hyperparameters such as sample_size, learn_rate, loss_reduction, however, they increased compute time drastically and the added accuracy was very minimal - making it less ideal. As such, mtry, min_n and tree_depth were the only hyperparameters tuned for this final analysis.
The top 5 best performing tuned boosted tree models, based on ROC AUC, can be seen in Table 25.
Table 25: Top 5 Tuned Boosted Tree Models Ranked by ROC AUC Performance
| mtry | min_n | tree_depth | .metric | .estimator | mean | n | std_err | .config |
|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 1 | roc_auc | binary | 0.946 | 5 | 0.006 | Preprocessor1_Model002 |
| 1 | 2 | 1 | roc_auc | binary | 0.945 | 5 | 0.005 | Preprocessor1_Model001 |
| 5 | 6 | 1 | roc_auc | binary | 0.945 | 5 | 0.005 | Preprocessor1_Model010 |
| 3 | 6 | 1 | roc_auc | binary | 0.945 | 5 | 0.005 | Preprocessor1_Model008 |
| 4 | 2 | 1 | roc_auc | binary | 0.945 | 5 | 0.005 | Preprocessor1_Model004 |
As can be seen in Table 25, the ROC AUC mean value is similar for the top 5, with the main difference between the hyperparameter combinations being the mtry value. Noticeably, tree_depth = 1 for all top 5, indicating very shallow trees (decision stumps), while min_n = 2 is common, meaning a node needs only 2 observations to be split. Finally, for the top model seen in the first row, mtry = 2, indicating that only 2 predictors are randomly sampled as split candidates at each split.
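The fact that depth-1 trees (stumps) win is characteristic of boosting: many weak learners fitted sequentially to each other’s errors add up to a strong model. A self-contained toy sketch of the idea - gradient boosting of regression stumps on made-up 1-D data (not xgboost’s exact algorithm, which adds regularisation and second-order terms):

```python
def fit_stump(xs, residuals):
    """Find the threshold split minimising squared error;
    returns (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lv) ** 2 for r in left)
               + sum((r - rv) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def boost(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting with stumps: each round fits the current
    residuals and nudges the prediction towards them."""
    pred = [sum(ys) / len(ys)] * len(ys)
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = fit_stump(xs, resid)
        pred = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, pred)]
    return pred

# Toy data: a step function the stumps can learn.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
pred = boost(xs, ys)
sse = sum((y - p) ** 2 for y, p in zip(ys, pred))
print(sse)  # far below the baseline SSE of 1.5 for a constant mean
```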
Table 26 illustrates the contribution metrics for the different variables for the boosted tree model.
Table 26: Feature Contribution Metrics for the Boosted Tree Model
| Feature | Gain | Cover | Frequency |
|---|---|---|---|
| covariate2 | 0.351 | 0.205 | 0.203 |
| linearity | 0.314 | 0.166 | 0.148 |
| x_acf1 | 0.151 | 0.213 | 0.212 |
| entropy | 0.148 | 0.215 | 0.223 |
| covariate1 | 0.037 | 0.201 | 0.214 |
As can be seen in Table 26, covariate2 is the most important feature in the boosted tree model, contributing 35.1% of the overall gain, followed by linearity at 31.4%. x_acf1 and entropy each contribute a moderate share (roughly 15% each), while covariate1 contributes almost nothing (3.7%), consistent with it adding little to the model's performance. Although all variables are used to split with similar frequency, there is a clear pattern in which variables improve the model the most, and it has been broadly consistent throughout this analysis.
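The gain column in Table 26 is already normalised, so each value can be read directly as a share of the model's total gain. A quick check of that reading, using the table's values:

```python
# Gain values copied from Table 26; since each importance metric is a share
# of the total, the column should sum to (approximately) 1.
gain = {
    "covariate2": 0.351,
    "linearity": 0.314,
    "x_acf1": 0.151,
    "entropy": 0.148,
    "covariate1": 0.037,
}

total = sum(gain.values())
ranked = sorted(gain.items(), key=lambda kv: kv[1], reverse=True)
for feature, g in ranked:
    print(f"{feature}: {g / total:.1%} of total gain")
```

The column sums to 1.001 rather than exactly 1 only because of rounding in the reported values.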
Overall, the hyperparameters that will lead to the best boosted tree model can be seen in Table 27.
Table 27: Best Tuned Boosted Tree Parameters Model Based on ROC AUC
| mtry | min_n | tree_depth |
|---|---|---|
| 2 | 2 | 1 |
Table 28 and Table 29 both give us insights into how well the boosted tree performed on the training data set and the testing data set, respectively.
Table 28: Confusion Matrix on Training Data for Boosted Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 319 | 12 | 0.964 |
| finance | 15 | 335 | 0.957 |
Table 29: Confusion Matrix on Testing Data for Boosted Tree
| type | birdsongs | finance | cl_acc |
|---|---|---|---|
| birdsongs | 135 | 8 | 0.944 |
| finance | 20 | 130 | 0.867 |
As can be seen in Table 28, the boosted tree correctly predicted birdsongs 96.4% of the time and finance 95.7% of the time on the training set - higher than the random forest (potentially indicating that it may be starting to overfit as it approaches 100%). Looking at the test set confusion matrix in Table 29, the boosted tree correctly predicted birdsongs 94.4% of the time and finance 86.7% of the time. Overall, this is quite a good result, and the test set confusion matrix compares favourably to the other models. Note that throughout the analysis, including for the boosted tree model, the models incorrectly predict birdsong for time series that are finance more often than vice versa.
On the training set:
\[accuracy = 96.04\%\] \[balanced \ accuracy = 96.04\%\]
On the test set:
\[accuracy = 90.44\%\] \[balanced \ accuracy = 90.54\%\]
As can be seen, the accuracy and balanced accuracy on the training set are very high, and the model continued to perform well on the test set, with an accuracy of 90.44% and a balanced accuracy of 90.54%. This boosted tree model has a higher accuracy and balanced accuracy on the test set than the random forest, tuned decision tree, “bad” decision tree, LDA or Logistic Regression models.
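The test-set figures follow directly from the confusion-matrix counts in Table 29; a quick arithmetic check (the original analysis computes these via yardstick in R, so this Python snippet is just a verification sketch):

```python
# Counts from Table 29 (rows = actual class, columns = predicted class).
correct_bird, wrong_bird = 135, 8     # actual birdsongs: correct / misclassified
wrong_fin, correct_fin = 20, 130      # actual finance: misclassified / correct

total = correct_bird + wrong_bird + wrong_fin + correct_fin
accuracy = (correct_bird + correct_fin) / total

# Balanced accuracy averages the per-class recalls, so it is not dragged
# up or down by class imbalance.
recall_bird = correct_bird / (correct_bird + wrong_bird)
recall_fin = correct_fin / (wrong_fin + correct_fin)
balanced_accuracy = (recall_bird + recall_fin) / 2

print(f"accuracy = {accuracy:.2%}")                    # 90.44%
print(f"balanced accuracy = {balanced_accuracy:.2%}")  # 90.54%
```

The per-class recalls (94.4% and 86.7%) are exactly the cl_acc column of Table 29.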
To better understand why the boosted tree model incorrectly predicts the type on the test set, Table 30 shows the probability the model assigned to each type and the vote difference, arranged from the smallest vote difference to the largest.
Table 30: Boosted Tree Predictions Sorted by Vote Difference
| .pred_birdsongs | .pred_finance | vote_diff | predicted_class | actual |
|---|---|---|---|---|
| 0.500 | 0.500 | 0.001 | birdsongs | finance |
| 0.496 | 0.504 | 0.009 | finance | birdsongs |
| 0.485 | 0.515 | 0.031 | finance | finance |
| 0.528 | 0.472 | 0.056 | birdsongs | finance |
| 0.542 | 0.458 | 0.083 | birdsongs | finance |
| 0.547 | 0.453 | 0.094 | birdsongs | birdsongs |
| 0.453 | 0.547 | 0.095 | finance | finance |
| 0.443 | 0.557 | 0.115 | finance | finance |
| 0.565 | 0.435 | 0.130 | birdsongs | finance |
| 0.579 | 0.421 | 0.158 | birdsongs | birdsongs |
| 0.581 | 0.419 | 0.162 | birdsongs | finance |
| 0.398 | 0.602 | 0.204 | finance | finance |
| 0.612 | 0.388 | 0.224 | birdsongs | finance |
| 0.645 | 0.355 | 0.291 | birdsongs | birdsongs |
| 0.665 | 0.335 | 0.330 | birdsongs | birdsongs |
| 0.328 | 0.672 | 0.343 | finance | finance |
| 0.328 | 0.672 | 0.343 | finance | birdsongs |
| 0.675 | 0.325 | 0.351 | birdsongs | birdsongs |
| 0.677 | 0.323 | 0.354 | birdsongs | finance |
| 0.320 | 0.680 | 0.360 | finance | finance |
Comparing Table 30 to that of the random forest model (Table 24), we see that there are now 11 observations that the boosted tree struggles to clearly distinguish (vote_diff < 20%) - more than the random forest. Additionally, within the 20 observations with the smallest vote differences shown, there are 9 incorrect predictions. Given that there are 28 incorrect predictions in total, only 9 of which sit in this low-confidence set, many errors are made with some confidence - suggesting the variables may not be distinguishable enough to achieve much higher accuracy than what is achieved, and explaining why errors remain in the boosted tree model. Additionally, Table 30 shows that the boosted tree model is much less confident in its predictions than the random forest model, with prediction probabilities that sit much closer together.
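The vote_diff column is simply the absolute gap between the two class probabilities, with the predicted class being whichever probability is larger. A minimal sketch of that computation, using a few illustrative probability pairs taken from Table 30:

```python
# (.pred_birdsongs, .pred_finance) for four test observations from Table 30.
preds = [
    (0.500, 0.500),
    (0.496, 0.504),
    (0.581, 0.419),
    (0.328, 0.672),
]

# vote_diff = |p_birdsongs - p_finance|; small gaps flag observations the
# model can barely separate. Sort from least to most confident.
rows = sorted(
    (abs(pb - pf), "birdsongs" if pb >= pf else "finance")
    for pb, pf in preds
)
close_calls = sum(1 for diff, _ in rows if diff < 0.20)
print(close_calls)  # 3 of these 4 observations fall below the 20% threshold
```

Note that the exact 50/50 row shows a vote_diff of 0.001 in Table 30 only because the displayed probabilities are rounded to three decimal places.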
c)
Figure 14 illustrates the ROC Curve for the random forest model and the boosted tree model.
Figure 14: ROC Curves for Random Forest vs Boosted Tree Model
As can be seen in Figure 14, the ROC curves for both models are similar, with the boosted tree curve appearing to perform slightly better. It is good to confirm this by checking the area under each curve - a summary is provided in Table 31.
Table 31: Area Under Curve (AUC) for the Random Forest and Boosted Tree Models
| Model | AUC |
|---|---|
| Random Forest | 0.965 |
| Boosted Tree | 0.971 |
Regarding the ROC Curve and the AUC, both of these models perform very well, and both outperform the tuned decision tree, “bad” decision tree, LDA and Logistic Regression models. Of the two, the boosted tree model edges out the random forest on both the ROC Curve and AUC. Consequently, as it has the best ROC Curve and AUC, the boosted tree model is the current best model for distinguishing between financial time series and audio tracks of birds.
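A useful way to read the AUC values in Table 31: AUC equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties counting half). The tiny pairwise implementation below illustrates this interpretation; the scores are made up for the example, not taken from the fitted models (which compute AUC via yardstick in R).

```python
def auc(scores_pos, scores_neg):
    """Pairwise (Mann-Whitney) AUC: fraction of positive/negative pairs
    where the positive case scores higher, counting ties as half a win."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical .pred_finance scores for true finance (positive) and
# true birdsong (negative) observations.
print(auc([0.9, 0.8, 0.7, 0.45], [0.1, 0.3, 0.5, 0.2]))  # → 0.9375
```

On this reading, the boosted tree's AUC of 0.971 says that for about 97% of finance/birdsong pairs, the model scores the finance series higher - which is why it ranks ahead of the random forest's 0.965.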
d)
Choice of Best Model
Overall, six different models were analysed and investigated: Logistic Regression, LDA, “bad” decision tree, tuned decision tree, random forest and boosted tree. Sorted from highest AUC to lowest, they rank: boosted tree, random forest, tuned decision tree, LDA, Logistic Regression, “bad” decision tree. The LDA and Logistic Regression models are close to one another, as are the tuned decision tree and the random forest. It also makes sense that the boosted tree model achieved the highest AUC, as each tree learns from and improves on the mistakes of the previous ones, unlike a random forest, which builds an ensemble of independent trees. Hence, judging by ROC Curve alone, the best model choice is the boosted tree model - with the overall largest AUC of 97.1%.
However, if we consider other factors such as model complexity and training speed, I would argue that the tuned decision tree model is the most appropriate. The tuned decision tree, although achieving only the third-highest AUC, trained very quickly while still maintaining a high AUC of 94.6%. Both the random forest and the boosted tree models took much longer to train, and a trade-off of about 2.5% of AUC relative to the boosted tree model, and 1.9% relative to the random forest, may be worth it for the speed.
Therefore, if I wanted a quick and relatively simple-to-train model that still has high accuracy, I would choose the tuned decision tree model. But if time did not matter and I simply wanted the most accurate model, I would choose the boosted tree model, perhaps even tuning further hyperparameters (which would drastically increase training time again, but may yield a slight gain in accuracy).
How the Time Series Typically Differ
Overall, it was found that the time series for financial data and birdsongs typically differed on two main variables. From the “bad” decision tree, tuned decision tree, random forest and boosted tree models, linearity and covariate2 were the two most significant variables in helping distinguish between birdsongs and financial time series. Even in the LDA and Logistic Regression models, these two variables had some significance. Additionally, it was learnt that covariate1 is not useful in distinguishing between the two types of data, indicating that it could probably be neglected in future.
Overall, no matter how good the models are, there remain observations they cannot correctly distinguish - the two types are simply too similar, making such cases hard even for a human to classify correctly. We also saw that the models are more likely to incorrectly classify a financial time series as a birdsong than vice versa, which is important to keep in mind when drawing conclusions from any of these models.
References
Arnold, J. B. (2012). ggthemes: Extra Themes, Scales and Geoms for ‘ggplot2’. R package version 5.1.0. Available at: https://CRAN.R-project.org/package=ggthemes
Canty, A., & Ripley, B. D. (2021). boot: Bootstrap R (S-Plus) Functions. Available at: https://CRAN.R-project.org/package=boot
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., … & Yuan, J. (2025). xgboost: Extreme Gradient Boosting. R package version 3.0.0.1. Available at: https://github.com/dmlc/xgboost
Cheng, J., Xie, Y., Wickham, H., Chang, W., & McPherson, J. (2023). crosstalk: Inter-Widget Interactivity for HTML Widgets. Available at: https://CRAN.R-project.org/package=crosstalk
Garnier, S., Ross, N., Rudis, B., Sciaini, M., Camargo, A. P., & Scherer, C. (2023). viridisLite: Colorblind-Friendly Color Maps (Lite Version). R package version 0.4.2. Available at: https://CRAN.R-project.org/package=viridisLite
Hart, C., & Wang, E. (2022). detourr: Portable and Performant Tour Animations. Available at: https://CRAN.R-project.org/package=detourr
Hvitfeldt, E., Silge, J., Kuhn, M., & Vaughan, D. (2023). discrim: Model Wrappers for Discriminant Analysis. Available at: https://CRAN.R-project.org/package=discrim
Kassambara, A. (2023). ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.6.0. Available at: https://rpkgs.datanovia.com/ggpubr/
Kuhn, M., Wickham, H., & Weston, S. (2020). Tidymodels: A collection of packages for modeling and machine learning using tidyverse principles. Available at: https://www.tidymodels.org
Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News, 2(3), 18–22. Available at: https://CRAN.R-project.org/package=randomForest
Milborrow, S. (2024). rpart.plot: Plot ‘rpart’ Models: An Enhanced Version of ‘plot.rpart’. R package version 3.1.2. Available at: https://CRAN.R-project.org/package=rpart.plot
Pedersen, T. L. (2025). patchwork: The Composer of Plots. R package version 1.3.0.9000. Available at: https://patchwork.data-imaginist.com/
Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., Elberg, A., & Crowley, J. (2024). GGally: Extension to ‘ggplot2’. R package version 2.2.1. Available at: https://CRAN.R-project.org/package=GGally
Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. Available at: https://plotly-r.com
Wickham, H., Cook, D., Hofmann, H., & Buja, A. (2011). tourr: An R Package for Exploring Multivariate Data with Projections. Journal of Statistical Software, 40(2), 1–18. Available at: http://www.jstatsoft.org/v40/i02/
Wickham, H., François, R., Henry, L., & Müller, K. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. DOI: https://doi.org/10.21105/joss.01686
Wickham, H., Hester, J., & Bryan, J. (2024). readr: Read Rectangular Text Data. R package version 2.1.5. Available at: https://readr.tidyverse.org
Xie, Y. (2025). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.50. Available at: https://yihui.org/knitr/
Zhu, H. (2024). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. Available at: https://CRAN.R-project.org/package=kableExtra